I. Introduction
A. Motivation is optimization - most of the time is spent in function calls and
in setting them up: calling sequences, linkages, and the arbitrary order of
evaluating arguments.
B. Diatribe on correctness
1. distinction between fact and intent
2. gcd example
C. Definition of the concept of an algorithm
1. tie in notion of equivalence
2. equivalence is unsolvable in general case
3. equivalence is subject initially to constraints imposed by
a. flow analysis - worst that could happen approach
b. understanding of assembly language - if every operation in
the LAP program is understood, then equivalence can be
determined
c. definition of effects of functions - i.e. duplicate occurrences
(e.g. result and effect cannot be decoupled in PUTPROP, while
they can be for SET, SETQ, RPLACA, and RPLACD)
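A minimal sketch of the result/effect distinction (MACLISP-style primitives;
the particular forms are invented for illustration):

   (SETQ X (CONS 'A 'B))    ; result: the new cell; effect: X is rebound - the two are separable
   (RPLACA X 'C)            ; result: the modified cell; effect: the CAR of that cell is changed
   (PUTPROP 'FOO 'BAR 'IND) ; result and effect coincide - the property list change itself

For the first two, a duplicate occurrence can reuse the already-known result
without repeating the analysis of the effect; for PUTPROP no such decoupling
is possible.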
D. An analysis of the properties of a programming language
1. semantics of primitive operations
2. semantics of primitive data types
E. Why use LISP?
1. a useful mechanism for describing an algorithm
2. case analysis
3. close relationship to a machine independent form (canonical form)
F. Examples of equivalences that can be detected by the system
1. See Chapter 1 of Winograd and/or Sussman for an example of how
to gently give a flavor of the power of a system. By power it
is meant an illustration of the type of equivalences and
optimizations that can be recognized.
2. Use various encodings of NEXT from compiler generated to Savitsky
algorithm with several intermediate steps such as my optimized
version and JRL's version. Encodings should be annotated with
a register transfer like description so that reader will not be
baffled by LAP.
3. Cap off by showing that system cannot recognize a reformulation of
the algorithm on a higher level. This is difference between
Algorithms 1 and 2.
4. Demonstrate the ability of the system to pinpoint errors and to
indicate the approximate location of the error causing operation.
Note the fact that currently correction is not attainable (point
out that this is a direction for future work - maybe too early?).
5. At present HIER1 is best example of error detection and correction.
Need to find a simpler example with a greater familiarity - i.e.
the meaning or purpose of the function should be clear.
G. Type of equivalence handled
1. refer to concept of algorithm
2. loop unrolling without redundancy - use an example with REVERSE or
even better NEXT. This will show that in a sense we define
a new function where the initial conditions are tested at the
end. Draw the analogy to a DO UNTIL loop in ALGOL (see the
worked example following this section).
3. equivalence between various assembly language encodings of a specific
algorithm by reducing them to a common form - i.e. the canonical
form. The equivalence is shown by transitivity.
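A worked illustration of 2. above (using the conditional notation of the later
chapters; the NEXT forms are those discussed in Chapter 6): where the
straightforward encoding performs the bare recursive call

   (NEXT (CDR L) X)

the unrolled encoding instead computes

   ((EQ (CDR L) F)→F,(NEXT (CDR L) X))

i.e. the exit test that normally guards the top of the loop is absorbed into
the bottom of the previous iteration, so the jump back can bypass the initial
test - precisely the behavior of a DO UNTIL loop in ALGOL, whose condition is
evaluated after the body.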
H. Basis of canonical form is paper by McCarthy -
"Towards a Mathematical Theory of Computation"
I. Comparison with Decompilation
1. distinction is no pattern matching
2. system is driven by an understanding system for assembly language
3. no use of or search for a syntax in the assembly language code for
the function being worked on (a la Hollander).
4. no use of pattern matching
5. decompiling systems (i.e. Housel) have generally verified the
correctness of the decompilation process by a test of the
decompiled function by trying it out with data. Our system
yields equivalence under all cases since the intermediate form
is derived by a symbolic execution of the assembly language
program under all possible conditions.
6. We do not perform decompilation because compilation is a 1-many
operation and likewise for decompilation. Whereas decompilation
systems use the original source program as the target or measure
of their success, we use an intermediate form which merely reflects
all of the computations performed.
7. That no decompilation is going on can be quickly seen from the two
encodings of HIER1 or even NEXT.
8. ACM 74 notes from session on decompiling (Nov. 13,1974)
J. Subset of LISP handled
1. EXPRs and FEXPRs
2. No PROGS
a. if no GO, then a PROG is equivalent to an EXPR
b. GO can be incorporated by use of pseudo functions - i.e.
break down the function into subparts where each part
is between two labels (see the sketch at the end of
this section).
3. RPLACA and RPLACD operations
4. Needless to say only compiled LISP is handled. This precludes the
assoc list mechanism of determining variable values. In fact
variable names only exist for global variables which have been
declared to be SPECIAL initially. Local parameters - i.e. the
formal parameters, have names which do not exist anywhere outside
of the function to which they are local and cannot be accessed
by name in functions called by the function to which they are
local.
5. SPECIAL variables declared on the outer level (see 4).
6. SETQ to local and SPECIAL variables.
7. Internal LAMBDA - i.e. common subexpressions
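A sketch of the pseudo-function idea of 2.b (the function and label names are
hypothetical; MACLISP-style syntax assumed):

   (DEFUN LAST1 (X)
    (PROG ()
     LP (COND ((ATOM (CDR X)) (RETURN X)))  ; exit test at the label
        (SETQ X (CDR X))                    ; step down the list
        (GO LP)))                           ; backward GO

The code between entry and the label LP, and between LP and the end, becomes a
pseudo function; the backward GO becomes a recursive call:

   (DEFUN LAST1 (X) (LAST1-LP X))           ; entry falls directly into the loop
   (DEFUN LAST1-LP (X)
    (COND ((ATOM (CDR X)) X)                ; former RETURN
          (T (LAST1-LP (CDR X)))))          ; former GO LP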
K. Possible use of the system in the future
1. Back end of an optimizing compiler - i.e. verify correctness of code
generated.
2. A debugging system for compilers
3. Computer aided instruction for assembly language programming.
L. Distinction between proving a compiler correct and our proofs for
individual programs. The general problem is quite difficult, and the
current solution is quite satisfactory at a reasonable cost.
M. Importance of machine description and independence of the proof procedure
and the host machine for the assembly language. Conceivably, a different
computer's instruction set could be plugged into the system with relative
ease.
N. Relationship to previous correctness approaches
1. Assertions - difficulty is in mechanizing the creation of assertions.
Moreover, this concept is basically tied to the "intent" school
of correctness (Feb. 1,1974)
2. Predicate Calculus diatribe (Feb. 23,1974)
3. Data base of the rederivation process (this is name given to the
procedure of going from assembly language to an intermediate form)
is basically a model of all of the computations performed up
to the current instruction. At each encounter of an instruction,
the current data base serves as a precondition, the instruction
as an operator, and the new data base as the postcondition. This
leads us to view the data base as an assertion about the
computation (vague and maybe unnecessary?).
4. Do not use input output pairs to show equivalence (this is basically
how Housel showed that his decompilation is correct). We feel that
this is too limiting since a proof is only good for its data.
O. The rederivation procedure works for compiler generated code as well as
handcode subject to constraints such as:
1. Results of a function are returned in a specific location
2. Arguments to a function are found in specified locations
3. The function is limited as to the parts of memory it can access
a. program
b. registers
c. stack
4. Note that the restrictions basically pertain to calling sequences and
protection.
P. System can also be viewed as an understanding system for assembly language
Q. Full analysis of implications of EQ, EQUAL, and ATOM
R. Equivalence is proved by use of a normal form rather than a canonical form
S. Correctness of optimization is a valid motivation for this project since a
measurement system showed that optimizations such as those the system
is capable of recognizing lead to significant time and space improvement
(see HIER1). In fact, one of the results of this study is that the
best calling sequence is not a stack or registers, but a combination of
the two. This is where confusion starts and the need for a system such
as the one proposed becomes more acute.
T. Relation to structured programming (see Knuth notes Sept. 24,1974) - i.e. LISP
is a structured programming language and we show equivalence through use of
suitable transformations.
U. Flow analysis poses limitations on the knowledge of the effects of functions.
V. Literature Survey
1. McCarthy
2. Floyd
3. King
4. Allen
5. Hollander
6. Housel
7. Schneider
8. Optimization via syntactic rather than semantic methods (Oct. 26,1974)
W. One of the main reasons for the success of the system is the proper delegation
of tasks between the proof procedure and the rederivation procedure.
Some examples are:
1. β-reduction - this is a fancy name for the process of substituting the
evaluated bindings of internal LAMBDA variables for the occurrences
of the variables in the original program. The numerical
representation of the function enables the adherence to call by
value despite this substitution (see the worked example at the
end of this section).
2. α-conversion - this is a fancy name for the renaming of variables. In
our case this occurs whenever we have a SETQ to a local variable:
we replace the local variable by its bound value. This is valid
since the name has no meaning. Actually, the distinction between
our use of β-reduction and α-conversion is hazy and should
probably be scrapped.
3. δ-reduction is exhibited by the avoidance of computing expressions and
conditions whose values are already known. The most common
example of this situation is the replacement of (T→p,q) by p
and likewise for (F→p,q) by q. This technique is applied in both
the rederivation procedure and the proof procedure.
4. The importance of 1. and 2. lies in the absence of names in the
assembly language program. Thus we are able to reconcile the
absence of names in the rederived form with their use in the
intermediate form.
5. Distinction between value and effect of a computation (SETQ, RPLACA,
RPLACD)
6. Jumps not to the start of the program and loop unrolling
a. proof that start can be bypassed is done in rederivation
b. proof that loop unrolling is OK even though condition
rearranging has taken place is done in proof procedure.
Specifically, note the use of manipulating the canonical
form and rederived forms at the appropriate times to
yield most "effective" proof method.
7. Others?
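Worked micro-examples of 1. and 3. (the forms are illustrative only):
β-reduction turns

   ((LAMBDA (Y) (CONS Y Y)) (CAR X))   into   (CONS (CAR X) (CAR X))

where the numerical representation assigns both occurrences of (CAR X) the
same computation number, so the argument is still evaluated exactly once and
call by value is preserved. δ-reduction turns (T→p,q) into p and (F→p,q)
into q, so that a condition whose value is already known is never recomputed.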
X. A function has two representations:
1. symbolic
2. numeric
Y. We deal with the problem of equivalence of two encodings of an algorithm.
We do not focus any attention on the correctness of the algorithm. However, in the
process of proving the equivalence of two algorithms, we have in fact
proved the correctness of the translation process from one encoding of
the algorithm to the other. The translation process is shown to be
correct for one specific case. Note that we cannot infer the
correctness of the translation process in general from this result.
However, this type of a procedure is useful in proving that a compiler
is correct since it need only be shown correct for the inputs supplied
to it. This point is often overlooked by other workers in the field.
In a sense it could be said that we have bootstrapped ourselves to
the state where we can attribute a state of "effective" correctness
to a compiler by merely demonstrating that its output is equivalent
to its input - a rather powerful step!
Z. In the future PROGs can be implemented along two avenues
1. Break up function into subunits of code between labels and CONDs
(pseudofunctions). Compile the pseudofunctions individually.
2. Use pseudofunctions, but when the rederivation procedure encounters a
label seen previously to which a redundant path cannot be found, it
makes sure that all locations referenced in the future have
valid contents (more later). Problem is with associating names
with the locations (see October-November 1973 notes).
AA. Peter Deutsch - December 14,1972: "If a program can't be verified, then either
it is wrong or one has inadequately specified the assertions." Feels that
this is all that we can ask from the system. Deutsch proves correctness
by means of assertions.
AB. Conclusion: one of main outgrowths of work is impetus it provided for
scrutinizing LISP. End result is the definition of CMPLISP and
implementation - protection. Concept of Free variable is vastly overblown -
global is sufficient as evidenced by the system {similarly for GO to
backward locations}.
AC. System provides a framework for proving optimizations correct. An interesting
future work is when some optimization cannot be handled, explain why.
AD. Proposal for Research: Implement CMPLISP, an optimizer, inter function proofs,
and extension to Interlisp?
Rough Outline
I. Introduction - demonstrate power and ideas - go through various optimization
examples (can use SYNPUB examples as well as various NEXT encodings) and
show types of optimizations we can prove equivalent.
II. Perspective - Literature Survey, and discussions on correctness, optimization,
and decompilation.
III. CMPLISP
IV. Canonical form and transformation of CMPLISP programs to it.
V. Rederivation - machine description, understanding system, problems with
equality, etc.
VI. Matching
VII. Examples
A. REVERSE
1. canonical form
2. rederived form
3. proof of equivalence
B. NEXT
1. canonical form
2. rederived program
3. proof of equivalence
C. HIER1
1. LISP encoding
2. unoptimized LAP
3. optimized LAP
4. optimized LAP with errors
a. point out errors
b. show how system fails to prove equivalence to LISP by
showing error messages and annotate them - not too
much detail (just discuss the type of error)
VIII. Conclusions and Future Directions
A. indicate strong points - i.e. what realizations enable us to derive
power
B. future
1. allow recursive calls to points past the start other than
self - i.e. closure mechanism in EL1 (Nov. 30,1973)
2. would like a general mechanism of inputting relations between
functions in terms of general variables. This can only be
handled as is presently done by creating instances of the
identities whenever they are true and entering them in the
data base. Problem is that data base becomes too big - OK
for limited identities. More difficult for rules which
involve uses of several functions - i.e. associativity would
result in the creation of a large number of relations - i.e.
coupled with commutativity a*b*c would have 8 relations.
3. better or more general equality algorithm
4. use of analysis of EQ, EQUAL, and ATOM corresponds to a theorem
prover for LISP. Would like to extend to bit manipulations
as well as arithmetic and ?. Basically, this depends on the
domain of the programs. For arithmetic we only have
commutativity. Clearly, would also like associativity.
Need a mechanism for declaring equalities in schema form
and recording them in such a manner rather than present
recording of all instances. For example, don't know about
associativity of APPEND.
5. Description of additions to the computation model - i.e. suppose
a parameter stack is used, then system will still work. The
addition is a data structure PSTACK (Interlisp uses a
parameter stack). By addition we mean that program can
translate input constraints such as preservation of
registers or stack upon a function call into code.
6. Environment other than LISP - see Fibonacci in SYNPUB with a
regular number representation rather than LISP.
7. Can use system to mark instructions that never get executed.
With this knowledge, the space required can be reduced
and furthermore, the code can be improved as a result of
such information. See Nievergelt (Feb. 1973) and redundant
code notes of Nov. 5,1974.
8. DWP comments in notes of Oct. 28,1974.
IX. Appendix describing LAP
A. LAP - include CALLF, JCALLF
B. instructions
1. verbal
2. program
3. primitives not previously explained
C. Detectable errors in LAP (give a listing of error messages from
emiterror)
X. Manual to show that system really works
XI. Bibliography
I. Introduction
A. indicate that all numbers used in examples and elsewhere are in OCTAL (base 8)
B. Examples
1. introduce NEXT - algorithm 1 (use SYNPUB discussion)
a. compiler code
b. handcode
c. equivalence
2. algorithm 2 for NEXT - describe features of various encodings
a. compiler code
b. handcode
c. JRL
d. Savitsky
e. equivalence
3. REVERSE - one argument
a. compiler
b. handcode
4. REVERSE - two argument
a. compiler
b. handcode
c. RPLACA
i. incorrect (in line RPLACA without checking ATOM)
ii. correct
d. incorrect - do I have one? (maybe wrong stack order)
e. interchanging first with second argument
i. incorrect (HRRZS@)
ii. correct
5. discussion
a. 1 to 2, 3 to 4, and 4 to 4e represent user manipulations of
assembly code once the optimizer has done its work.
b. mention stepwise refinement
c. like Knuth's "Structured Programming with GO TO" and see my
notes
d. these transformations show need for correctness
6. optimizations
a. inner loop faster
b. order of argument evaluation rearranged
c. make use of results of tests (algorithm 2 for NEXT)
d. bypass start of program on recursive call
e. calling sequence
7. what the system can and cannot do
a. cannot convert algorithm 1 to algorithm 2 for NEXT
b. cannot convert non list modifying to list modifying for REVERSE
c. One argument to two argument REVERSE cannot currently be done
although we could make use of schema-like transformations as
shown in SYNPUB for REVERSE. Similarly for FIBONACCI.
d. cannot rearrange order of parameters for a function as in 4e
e. can detect errors
8. maybe use other examples in SYNPUB - i.e. SUBB
9. conclusion
a. we provide a framework for proving optimizations such as these
correct
b. current scenario envisions the user sitting at a terminal and
interactively optimizing his program with simultaneous
proofs
c. good for speeding up inner loops as shown for NEXT and REVERSE
d. in future build into an optimizer or incorporate with it
e. we are able to handle all occasions where the basic algorithm
does not change
IV. Canonical Form (actually normal)
A. JMC definitions and MATCMP paper
1. defines a canonical form
2. becomes a normal form when introduce knowledge of equality (EQ)
3. JMC proves existence of canonical form for gbf's
4. we make extensions to incorporate LISP
5. axioms become qualified by use of flow analysis information
B. Inclusion of side-effects
1. need a representation in terms of order of computation
2. numbering algorithm
3. application of axioms need checks for redundancy
4. flow analysis and limitations - i.e. assume worst can happen
C. Extensions to handle constructs
1. COND
2. Multiple results (FN)
3. Internal LAMBDA
4. NULL
5. AND, OR
6. PROG
D. Distinction between value and effect
1. RPLACA
2. RPLACD
3. SETQ to SPECIAL variables
4. SET (?)
E. Implicit and explicit conditions (part of axiom ?)
1. ATOM
2. EQ
3. EQUAL
F. Maintenance of order of computing test both explicit and implicit - axiom
application still maintains order
G. Conversion of breadth first to depth first - functions must have higher
computation numbers than their arguments.
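For example (the particular numbers are arbitrary; only their relative order
matters), (CONS (CAR X) (CDR X)) could be numbered X:1, (CAR X):2, (CDR X):3,
(CONS (CAR X) (CDR X)):4 - each function receiving a higher computation number
than any of its arguments.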
H. Algorithm F for determining bindings of variables
1. SETQ to local variables and use of variable bindings
2. LAMBDA variables
3. PROG variables
I. Duplicate Predicate Removal Algorithm
1. based on properties of functions in CMPLISP section
2. flow analysis used
3. redundant storage on first instance - see definition in CMPLISP;
this means on all paths
VI. Matching
A. why not lexicographic minimum (Feb. 22,1974)
B. loop unrolling and loop economy discussions
C. modifications to REDERIVE - PASS4
1. Powerset of labels branched to
in backward direction will yield one rederived program for
each element of powerset
2. Question: When determining argument values should we use
matching as in the matching phase or stick to redundant
computations. Think about it and try to justify choice.
Answer: Redundant computations is the right solution since
if a computation is repeated in the beginning of a function
as well as prior to the call, then its elimination (or
bypassing) may be wrong. For example, suppose a function
known to read and modify a special variable fits this
criterion. Moreover, the arguments are identical. In this
case, its bypassing would be a mistake. The function could
be of a nature that increments the SPECIAL variable. This
should only be mentioned in Chapter 6 once the distinction
between matching and redundancy is explained.
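A hypothetical instance of the Answer above (the function name and SPECIAL
variable are invented for illustration; COUNT is assumed declared SPECIAL):

   (DEFUN BUMP () (SETQ COUNT (ADD1 COUNT)))  ; reads and modifies the SPECIAL COUNT

If (BUMP) is computed just prior to a call and again at the beginning of the
called function, the two occurrences have identical arguments, yet bypassing
the second one would be wrong: each occurrence both reads and increments COUNT.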
D. Unspecified parameters
E. Pinpointing errors
1. computations out of order
2. missing computations - a value was computed in the rederived form
which was not specified in the canonical form or out of order.
This could result from
3. indicate when a location was garbaged - storage history field in
data descriptor
4. missing condition
5. mismatching conclusion - the conclusion in the rederived form does
not match the conclusion as specified by the canonical form.
6. missing FN list computation in rederived form (i.e. a computation
not used as a subexpression of a predicate or a result was not
computed by the assembly language program).
7. unknown conclusion in the rederived form resulting from an error in
the assembly language program.
8. anything else?
9. illustrate by examples
F. Matching algorithm and various passes
1. use induction for backward jumps
2. termination
3. Sometimes manipulate canonical form and sometimes rederived form.
a. We take the basic approach that the rederived program represents
a program and we wish to manipulate the canonical form in
legal ways to obtain a match. When this is done, we say
that the programs are equivalent. This will account for
rearranging of conditions as well as the order of performing
computations (i.e. order of argument evaluation). Thus by
finding equivalence preserving transformations we are
proving that the rederived form is equivalent to the
canonical form.
b. We manipulate the rederived form to match the canonical form
when we try to match backward jumps. The canonical form
consists of the already manipulated version. This is
because we will be trying out various versions of
abbreviated rederived forms with induction.
c. The above is necessary especially when conditions have been
rearranged. Problem here is that if we use the original
canonical form, then conditions have not been rearranged.
However, when we expand recursive call mismatches we use
the rederived forms plugged into the mismatching function
calls in the canonical form since by induction, the
canonical form has already been reordered to correspond
to the order implied by the rederived form.
G. Proof procedure is machine independent and likewise for intermediate
representation.
H. Matching
1. no backward jumps
a. match rederived against canonical manipulating canonical
b. numbering
c. computations
i. explicit
ii. implicit
d. conditions
i. EQ, ATOM
ii. EQUAL and check redundant since EQUAL is not primitive
e. terminal node
I. Give NEXT as an example of proof
A. definition of matching
B. matching algorithm
C. loop unrolling versus loop economy
D. changes to rederivation process
E. modification for backward jumps
F. type of errors that can be detected
A. Introduction
Once the original LISP program has been converted to the canonical form and the
LAP program converted to the rederived form, we are ready to attempt to prove the
equivalence of the two. From previous discussions and examples it should be clear
that there is no valid reason for assuming that the outputs of the two procedures
are identical (symbolically and to a lesser degree numerically).
One possible solution is to find some representation of the two forms which is in
terms of a lexicographic minimum. This would remove the problem that is associated
with the use of substitution of equals for equals. Such a solution would
imply that we could find a base representation for the rederived form and likewise
for the canonical form and then simply perform a strict equality match. However,
such a solution fails to take into account the possibility of condition rearranging
which does not preserve the lexicographic ordering property. For example, consider
***Feb. 22, 1974 discussion***
Moreover, the lexicographic minimum does not solve problems associated with
rearranging the order of computation of various functions as well as deal with
the concept of recursion via jumps to points past the start of the program.
In the process of proving equivalence we will have to show that neither the
canonical nor the rederived form reflects computations not performed in the other. We
have already addressed ourselves in part to this problem in the previous chapters
when we discussed the removal of duplicate predicates and computations from the
canonical and rederived forms. Recall that at the end of the rederivation
and canonical form procedures, we removed all redundant first instances of
assignment. This was needed to insure that assignments into elements of the List
Structure were not done for the purpose of temporary storage of results - i.e. the
location could not be subsequently referenced with the specific value as its
contents. Occurrences of functions in one form and not in the other will lead
to inequivalence unless the functions are of a primitive nature since they can be
introduced at will.
For example, consider the case of a computation of CAR(<expr>) which
can not be matched even though CDR(<expr>) was matched or vice versa. In
this case we currently say that inequivalence is the case since a
computation is performed in one form and not in the other. Actually,
we should say that if CAR (or CDR) is legal then so is CDR (or CAR).
In other words the operation becomes primitive in such a case. This is
because no side effect can occur - i.e. recall that the problem was that CAR
and CDR are not defined when their argument is non-atomic. But the act of
computing one of them safely implies that the other can also be
computed safely. This problem is quite common in a LISP implementation
where a word represents more than one LISP entity - i.e. a pointer to CAR
in the left halfword and a pointer to CDR in the right halfword. In such a
case we may access an entire word when in fact we only desire a specific half.
A similar consideration is if ATOM(A) is known not to be true, then CAR(A) and
CDR(A) are also safe provided that they are performed after the ATOM test.
One solution is to place CDR(A) on the FN list (and on UNREFERENCED) whenever
CAR(A) is seen, and similarly place CAR(A) whenever CDR(A) is seen. Also place CAR(A) and
CDR(A) on the FN list (and on UNREFERENCED) whenever ATOM(A) is known to be
NIL (i.e. false). The only problem is that when we remove redundant first
instances of computation (see the canonical form process and also the
postprocessing stage of the rederived form) involving modification of heads or
tails of elements of the List Structure, an access to the head or tail which
appears as an argument in a non-result slot of an FN construct is not
considered as an access operation.
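To make the halfword point above concrete (PDP-10 assembler notation as used
by LAP; the accumulator and index register numbers are arbitrary):

   HLRZ 1,(2)   ; fetch the left halfword - the CAR - of the cell addressed by register 2
   HRRZ 1,(2)   ; fetch the right halfword - the CDR - of the same cell

Either instruction reads the entire word, so safely fetching one half implies
that the other half could have been fetched safely as well.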
We propose a solution for a CAR(A) operation that cannot be matched. This
procedure is applied prior to removing redundant first instances of a
computation involving modification of heads or tails of elements of the List
Structure at the end of the canonical form process described in
Chapter 3 and during the postprocessing stage of the rederivation process.
It should be noted that the same solution would hold for CDR(A)
(actually need to replace CAR by CDR and CDR by CAR in the discussion).
Examine the non-result
arguments of FN constructs for instances of CAR(A). If all such occurrences
appear as non-result arguments of FN constructs, and if CDR(A) appears in
a predicate or in the result of a path containing CAR(A), then CAR(A) can
safely be removed from the FN list. By safely removed we mean that its
computation qualifies as a primitive operation. We leave the case of ATOM
for future work although it is clear that it would be handled in much the
same manner.
Our solution to the equivalence problem is to transform one form into the other.
We have two choices of operation. We can show via the use
of equivalence preserving transformations (the axioms and substitution) that the
canonical form can be transformed into the rederived form or vice versa. However,
this is not enough; we must also demonstrate that neither form reflects any
computations not performed in the other. Thus we will perform the matching process
twice; once manipulating the canonical form to match the rederived form, and next
manipulating the rederived form to match the result of the previous manipulation of
the canonical form.
The remainder of the chapter expands on the previous notions. We first present
the matching algorithm in terms of transforming the canonical form to match the
rederived form. We next discuss the issues relevant to proving equivalence when
recursion has been implemented via jumps to points other than the start of the
program. This is coupled with modifications to the rederivation process as well
as the matching procedure just presented. Finally, the types of errors that can
be discovered are briefly discussed.
***has matching been defined yet?***
The matching procedure consists of manipulating the canonical form to obtain an
identical form to the rederived form. This is done by examining the rederived
form in order of increasing computation numbers and finding what will be known as
"matching" computations. We use a variant of the canonical form algorithm for
strong equivalence described in Chapter 4. The difference is that in addition to
predicates being rearranged via axiom (8), the two forms may differ because
computations may be computed out of sequence as well as the use of substitution
of equals for equals. Thus we see that the concept of inevitable predicate must be
extended to include all computations. Basically, for each computation performed in
the rederived form, we must ensure that it is also computed in the canonical form.
This means that if a computation appears in a condition (non-terminal node) in the
rederived form, then it must either appear in the corresponding condition in the
canonical form, or in each of the subtrees of the unmatched canonical form. Of
course the same criterion holds for computations in terminal nodes.
Recall that the numerical representation of a function serves to indicate a relative
ordering between the various computations in terms of their instance of computation.
Thus the magnitudes of the numbers are seen to be insignificant. We will make use
of the following numbering scheme which will be seen to have certain
desirable properties. In order to perform the matching process we start with all
computations in the canonical form having higher numbers than those in the rederived
form. The only exceptions are atoms (including local parameters but excluding
SPECIAL variables) which are assigned identical computation numbers in the two
representations.
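For instance (illustrative numbers only; just their relative order matters),
the rederived form might carry (CAR X):2, (CDR X):3, and
(CONS (CAR X) (CDR X)):4 while the canonical form initially carries the same
computations as 102, 103, and 104; the atom X receives one identical number,
say 1, in both forms.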
Each computation in the rederived form must be proved to be inevitable in the
canonical form. This means that the computation or its equivalent must be shown to
be computed in
the canonical form. Moreover, its instance of computation in the canonical form
must be shown to yield an equivalent result to the value computed in the rederived
form. This process is termed finding a matching instance. The criteria for
matching are given below.
1. If the function reads and modifies a SPECIAL variable, then there must be no
intervening reading or modification of the said variable.
2. If the function only reads a specific SPECIAL variable, then the said variable
must have the same values prior to the two instances of the function.
3. If the function only modifies a specific SPECIAL variable, then there must be no
intervening reading or modification of the said variable.
4. If the function reads and modifies heads (or tails) of elements of the List
Structure, then there must be no intervening reading or modification of the said
part.
5. If the function only reads heads (or tails) of elements of the List Structure,
then there must be no intervening modification of the said part.
6. If the function only modifies heads (or tails) of elements of the List
Structure, then there must be no intervening reading or modification of the said
part.
7. The function may perform a CONS operation.
The criteria for matching are quite similar to those used to determine duplicate
computation. However, there are several differences. Operations resulting in
the modification of SPECIAL variables and the List Structure may also read the
said items. Also an LCONS operation is permissible. Without the previous
differences constructs employing such functions could not be matched up.
The key to the matching procedure is that whenever a computation, say A, is matched
with a computation not previously encountered, say B, then all occurrences of B in
the canonical form are replaced by A. Similarly for the SPECIAL variables modified
by A - i.e. the computation number associated with SPECIAL variables modified by A
replaces the computation number associated with SPECIAL variables modified by B.
This means that during the matching process
all computations in the canonical form with computation numbers higher than those
in the rederived form have not yet been matched. Moreover, at the end of the
procedure we will have managed to transform the canonical form into the rederived
form if equivalence holds.
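Continuing the illustrative numbering above, when (CDR X):103 in the canonical
form is matched with (CDR X):3 in the rederived form, every occurrence of
computation number 103 in the canonical form is replaced by 3; afterward, any
canonical computation still numbered above the rederived range is, by
inspection, not yet matched.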
The process of finding a matching computation is quite straightforward. We process
the rederived form in increasing order of computation number. For each function,
say A, search the canonical form for a matching instance. Recall that all
computations in the canonical form with a computation number less than the number
associated with A have already been matched. The matching instance
can be encountered in two ways. An explicit occurrence is defined to be a function
which has not been previously encountered in the canonical form. An implicit
occurrence is a function which has already been previously encountered when matching
other computations. This results from use of equality information and subsequent
substitution of equals for equals. For example, suppose that we are looking for
(CDR B). (CDR A) has already been computed and matched. In the true subtree of
the condition (EQ A B) we see that (CDR A) is EQ to (CDR B) and thus (CDR A) is a
matching instance of an implicit nature. Thus all that remains is to search the
false subtree for an explicit occurrence of (CDR B) or another implicit equivalent.
As soon as all of the arguments of a predicate, say p, are matched up we check if it
is a primitive function. If yes, then apply the following transformations to the
canonical form, say CAN, corresponding to the part of the rederived form currently
being processed.
1. Replace CAN by (p→CAN,CAN). This corresponds to application of axiom (1).
2. Apply the breadth first to depth first renumbering algorithm to the result of 1.
This is necessary because the conclusion and alternative clauses have the same
numeric representation after application of 1. Moreover, subsequent applications
of matching assume that the canonical form has the properties that all functions
with the same computation number have been computed simultaneously, and that all
computations computed in the left subtree have a higher computation number
associated with them than those computed in the right subtree.
3. Apply the duplicate computation removal algorithm to the result of 2. This
corresponds to the application of axioms (5) and (6) as indicated in the canonical
form algorithm for strong equivalence given in Chapter 4.
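A small worked instance of steps 1-3 (the forms are invented for
illustration): suppose the rederived form tests the primitive predicate
(ATOM X) at a point where the corresponding canonical form is simply
CAN = (CONS X Y). Step 1 produces ((ATOM X)→(CONS X Y),(CONS X Y)); step 2
renumbers this form depth first so that the two arms no longer share
computation numbers; step 3 then removes any computation that the new
condition has rendered redundant in either arm.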
We did not need to match primitive predicates because by definition primitive
functions can be considered as inevitable computations. However, if the predicate
does not correspond to a primitive function, then we must insure that it is indeed
inevitable. We choose a different method for proving inevitability. We attempt to
prove that at each terminal node of the canonical form, the predicate is redundant.
If the latter is true, then clearly the predicate is inevitable; otherwise it is
not. The reason for use of such a procedure is best exemplified by the predicate
EQUAL. In this case, if we were to look for an occurrence of the predicate in the
same manner as was done for functions, then the situation may arise that we can not
match it. For example, consider (EQUAL A B) in the rederived form, and a canonical
form consisting of the tree in Figure ? with the left and right side representing
the true and false cases respectively of a predicate.
(ATOM A)
/ \
/ \
(ATOM B) (EQUAL A B)
/ \ / \
/ \ / \
(EQ A B) COMP_3 COMP_4 COMP_5
/ \
/ \
COMP_1 COMP_2
In this example we see that the predicate (EQUAL A B) is redundant by the time
COMP_1, COMP_2, COMP_3, COMP_4, and COMP_5 are performed. However, if we were to
use our matching algorithm to detect matching instances of the predicate, then we
would lose. Note that we have made use of the implications of the various
predicates as outlined in Chapter 3. Actually, our statement about a loss is not
quite correct since a value of the predicate of T or F can be considered as an
implicit occurrence. Nevertheless, we shall stick with our treatment. Finally,
we point out that this example illustrates how problems pointed out in the
discussion of the lexicographic minimum approach to matching do not surface when
equivalence is proved using our matching techniques (recall problem in
distinguishing between occurrences of EQ and EQUAL as well as making use of
such relationships).
Once a predicate has been matched we proceed to apply the matching procedure to
the conclusion and alternative clauses of the rederived form. Once the entire
rederived form has been processed by the canonical form algorithm, we see that
all computations performed in the rederived form have been matched (i.e. shown
to be equivalent) with computations in the canonical form. However, we have not
yet proved that all computations performed in the canonical form have also been
performed in the rederived form. This can be done by applying the matching
procedure with the roles of the canonical form and rederived form reversed - i.e.
manipulate the rederived form to match the canonical form. This problem will
disappear once we present, in a subsequent section, the method for handling matching
of functions with recursive calls implemented as jumps to points other than the
start of the program.
From the discussion we see that the proof procedure is machine independent. Also
notice how the various properties of the depth first numbering scheme are used to
enable the matching procedure to correspond quite closely to the duplicate predicate
removal process. In fact many of the same routines are used in the implementation.
D. modification for backward jumps
***indicate earlier that we may manipulate the canonical form to match the
rederived form or vice versa***
1. Perform as much matching as possible by manipulating canonical form to match
the rederived form. By as much as possible we mean that if a mismatch occurs on
one path, then process the alternate path in the same manner. The result of the
procedure is a transformed canonical form, say CANRESULT.
2. If no mismatches occurred then we have successfully
managed to prove that the canonical form can be transformed in a legal manner to
match the rederived form.
3. At least one mismatch has occurred. There are several possible causes.
a. Computations were encountered that could not be matched. This could have been
caused by the expansion of a recursive call as in the case of loop unrolling or
by an error. The exact reason is resolved by the remainder of the algorithm.
b. An attempt was made to match a result computation (i.e. a terminal node) in the
rederived form against a condition (i.e. a non terminal node) in the canonical form.
If the condition is a predicate of a primitive nature having identical conclusion
and alternative clauses, then the remainder of the algorithm will apply axiom (1).
Renumber both the original rederived form
and the canonical form resulting from step 1 - i.e. CANRESULT. However, this time
the rederived form receives the higher numbers. This renumbering is the algorithm
mentioned in the previous section. At this point we attempt to manipulate the
rederived form to match the canonical form. Note that this is the reverse of the
procedure applied in step 1 because we wish to detect the exact instance of the
mismatches so that we can determine if they are caused by recursive calls in the
canonical form which have been expanded in the rederived form via loop unrolling.
If all mismatches are of this nature, then proceed to step 4. Otherwise, the
algorithm terminates with a result of inequivalence or an inability to prove
equivalence.
All mismatches that
were caused by case b. with a primitive predicate and identical conclusion and
alternative clauses no longer exist because the matching algorithm applies
axiom (1) followed by axioms (5) and (6). For example, suppose we try to match
(CDR A) with ((EQ (CDR A) F)→F,(CDR A)). In this case, the equality does hold.
If the former is in the rederived form and the latter is in the canonical form,
then when attempting to manipulate the canonical form to match the rederived
form, the matching procedure will fail and declare a mismatch because of case b.
The next attempt at matching will manipulate the rederived form to match the
canonical form. We take advantage of the fact that EQ is a primitive
predicate and that the computations (CDR A) and F are inevitable to replace
(CDR A) in the rederived form by ((EQ (CDR A) F)→(CDR A),(CDR A)). The equivalence
should be obvious. Note that no special handling was required; this is simply
part of the matching algorithm since it involves the application of axiom (1).
If in fact the condition was in the rederived form and the result in the
canonical form, then step 1 would not have detected any mismatch since axiom (1)
would have been applied as indicated here.
Step 3 is necessary since in the case of mismatches due to loop unrolling, step 1
will detect a mismatch when a computation is performed in the rederived form and
not in the canonical form or vice versa. Most often, the mismatch is of the
former type and is usually in the condition although it may also be in the result
clause. We give an example of a mismatch occurring because a computation
performed in the canonical form was not performed in the rederived form.
Consider the optimal encoding for the function NEXT. In this case we will
detect a mismatch when attempting to match ((EQ (CDR L) F)→F,(NEXT (CDR L) X))
in the rederived form with (NEXT (CDR L) X) in the canonical form. The first
step is to apply axiom (1) to (NEXT (CDR L) X) in the canonical form since
(CDR L) and F are inevitable computations and EQ is a primitive predicate.
The resulting form is ((EQ (CDR L) F)→(NEXT (CDR L) X),(NEXT (CDR L) X)).
We now must match the conclusion and alternative clauses. The alternative
clause is seen to match; however, the conclusion clause does not match - i.e. F
is not matched by (NEXT (CDR L) X). Application of step 3 will result in an
inability to match (NEXT (CDR L) X) with the condition in the rederived form.
Thus we see that step 3 will indicate if the cause of the mismatch was loop
unrolling of a recursive call.
***indicate that examples neglected to mention numeric representation***
Still left:
1. Subset of CMPLISP handled does not include PROGs (can include PROGs with a
restricted non-backward GO).
2. modify EQUAL.PUB to include ATOM (significant update)
3. diagrams of environment in CHAP3.PUB
4. DERIV6 and CANON6 changes:
a. in CANON6 change all use of FUNCTION to QUOTE
b. in CANON6 and DERIV6 change all QUOTE to QUOTEATM if argument is an
atom and to QUOTELST if argument is a list.
This is done upon input (GETINPUT in DERIV6 and SUBSTITUTE in
CANON6). Also change OUTPUTLAP in DERIV6 to convert QUOTEATM
and QUOTELST back to QUOTE for outputting the lap code.
c. whenever test for QUOTE we should test for MEMBER in QUOTED where
QUOTED←<'QUOTEATM,'QUOTELST>
d. QUOTELST has the 'LCONS property
e. make sure parse routines know about QUOTED
f. zero computation numbers are still assigned to the components of
the argument to QUOTE
g. PRUNEFN and CHECKREDUNDANT no longer mess around with QUOTE
h. check all other uses of QUOTE
5. Implement SET in DERIV6 and CANON6 (major piece of work).
6. Calls to FEXPRs are not yet implemented in DERIV6 and CANON6 and will not be
included in the thesis - left for future work (only QUOTE is implemented).
7. Check last part of CHAP3.PUB to make sure the mention of all bits being 1 for
an atom is not out of place. Basically, want to eliminate reference to NIL
being 0 internally except when we define the new property list format.
8. CMPLISP means that REMPROP, PUTPROP, REMOB, . . . do not have list modification
effects. Therefore must change for flow analysis.
9. No need for termination test in flow analysis.
10. In CANON6 STEP3ASSIGN: (SETQ A (SETQ B C)) is not done right - must check if
second argument is SETQ and if yes, then check again . . . until get
binding. This is only necessary in case inner SETQs are to SPECIALs since
otherwise SETQ disappears.
11. Algorithm for binding variables in CANON6 must be changed to reflect new
correct method. Basically, need a third representation, the binding
representation. Also, no longer need SETL, DOL for LAMBDA handling.
12. Flow analysis must not consider (RPLACA <special variable> <binding>) as
an access operation on the SPECIAL <special_variable>.
13. In CHAP4.PUB should probably give definition of a tree (root, left subtree,
right subtree). Mention that a LISP program is like a lattice in numbering
scheme. Canonical form is not unique if consider substitution of EQ items
for other EQ items (mention that uniqueness can occur via use of
lexicographic minimum but not worthwhile since matching phase will change
to fit - should go in second paragraph of Adaptation to LISP).
14. Emit a GENSYM for 'FLAGS - i.e. CSYM(FLAGS000) initially. Must make sure that
equality check works OK for 'FLAGS. I believe yes. Need to fix SUBHALF1 in
case subtract two identical FLAGS. Also in description of CALL, PUSHJ we
need to use GETFLAGS() instead of FORMDATAPOINTER('FLAGS). GETFLAGS
creates a datapointer with a GENSYMed FLAGS value and saves FLAGS counter.
15. No CONS(CAR(A),CDR(A)) EQ A because not true since no hashing - this should be
indicated in CHAP3.PUB.
16. Use NEXTINSTRUCTION() instead of NIL in routine descriptions in Appendix.
17. In CHAP5.PUB when discussing EQSUB in restrictions section note that the
appearance of the construct as a non-result argument to an FN construct
means that arguments are placed on the FN list. I believe that this is currently
done. DERIV6 should also be modified to place the operands to operations
which yield results of type UNKNOWN on UNREFG if they have not been
previously referenced. This should probably be mentioned in CHAP5.PUB .
This must be done so that we don't lose track of some LISP computations.
18. CHAP3.PUB has an error in section on when two items are EQ. Specifically, if
the function does modification of heads (tails) then it cannot access heads
(tails) and there must be no intervening modification of heads (tails).
We leave the error uncorrected to see if it will be caught.
19. CHAP5.PUB mention in measurement discussion comments about effect of stack size
on garbage collection and other comments in Knuth notes (Sept. 24,1974).
20. Super LISP by John Allen notes (Sept. 26,1974) has a blurb about COND not being
a function and also about a characterization of functions in terms of total
and partial.